1. Probability

1.1. Marginal Probability

  • Probability distribution of a subset of a larger collection of random variables.

1.2. Conditional Probability

  • Probability of an event, given the values of other variables.
  • \[ p_{Y|X}(y\mid x) := \mathrm{P}[Y=y\mid X=x] = \frac{\mathrm{P}(\{X=x\}\cap \{Y=y\})}{\mathrm{P}(\{X=x\})} \]
  • \[ f_{Y|X}(y\mid x) = \frac{f_{X,Y}(x,y)}{f_X(x)} \]
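  • A minimal Python sketch (assuming NumPy; the joint-table values are hypothetical) showing how a conditional PMF is obtained from a joint PMF by dividing by the marginal:

      import numpy as np

      # Joint PMF p_{X,Y}(x, y) for X in {0, 1} (rows) and Y in {0, 1, 2} (columns).
      p_xy = np.array([[0.10, 0.20, 0.10],
                       [0.15, 0.25, 0.20]])

      p_x = p_xy.sum(axis=1)             # marginal p_X(x): sum out Y
      p_y_given_x = p_xy / p_x[:, None]  # conditional p_{Y|X}(y | x)

      print(p_y_given_x[0])  # distribution of Y given X = 0; each row sums to 1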

1.3. Law of Total Probability

  • Relation between marginal probability and conditional probability.
  • \[ \mathrm{P}(A) = \sum_k \mathrm{P}(A\cap B_k) = \sum_k \mathrm{P}(A\mid B_k)\mathrm{P}(B_k) \]
  • \[ \mathrm{P}(A) = \int_{\mathbb{R}}\mathrm{P}(A\mid X=x)\,dF_X(x) \]
  • Further, \[ \mathrm{P}(A\mid B) = \sum_n \mathrm{P}(A \mid C_n)\,\mathrm{P}(C_n\mid B) \]
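  • A small numeric sketch of the law of total probability (plain Python; the partition probabilities are hypothetical):

      # Hypothetical partition {B_1, B_2, B_3} of the sample space.
      p_B = [0.5, 0.3, 0.2]          # P(B_k)
      p_A_given_B = [0.9, 0.5, 0.1]  # P(A | B_k)

      # P(A) = sum_k P(A | B_k) P(B_k)
      p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))
      print(p_A)  # 0.45 + 0.15 + 0.02 = 0.62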

1.4. Bayesian Probability

  • Interpretation of probability as reasonable expectation, rather than frequency or propensity.
  • It represents a state of knowledge or a quantification of personal belief.

2. Probability Space

2.1. Definition

  • A measure space such that the measure of the whole space is one.
  • It is a triple \((\Omega, \Sigma, \mathrm{P})\) consisting of
    • The sample space \(\Omega\): an arbitrary non-empty set,
    • The event space \(\Sigma\): a σ-algebra on \(\Omega\),
      • \(\Sigma\) stands for sigma-algebra. By convention, \(\mathcal{F}\) (for filtration) or \(\mathcal{A}\) are also often used instead.
    • The probability measure \(\mathrm{P}: \Sigma \to [0, 1]\).

2.2. Probability Measure

2.2.1. Definition

  • A probability measure \(\mathrm{P}\) is a measure over \(\Omega\), with:
    • \(\mathrm{P}: \Sigma \to [0, 1]\), with \(\mathrm{P}(\varnothing) = 0\) and \(\mathrm{P}(\Omega) = 1\).
    • Countable Additivity: \[ \mathrm{P}\left(\bigcup_{i\in \mathbb{N}}E_i\right) = \sum_{i\in \mathbb{N}}\mathrm{P}({E_i}) \] where \(\{E_i\}\) are pairwise disjoint sets.
  • The validity of this definition of a probability measure is precisely given by the Kolmogorov axioms:
    1. Non-negativity
    2. Unit measure
    3. σ-additivity

2.2.2. Notations

  • Probability that a random variable \(X\) takes a value in a measurable set \(S\subseteq E\) is written as \[ \mathrm{P}[X\in S] := \mathrm{P}(\{\omega \in \Omega\mid X(\omega)\in S\}). \]

3. Random Variable

4. Probability Distribution

  • A probability distribution forgets the probability space, and only remembers the output values of a random variable.

4.1. Properties

4.1.1. Mean

4.1.2. Variance

4.1.3. Skewness

  • Skewness (Korean: 왜도): a measure of the asymmetry of the distribution about its mean.

4.1.4. Kurtosis

  • Kurtosis (Korean: 첨도): a measure of the heaviness of the distribution's tails.

4.1.5. Absolutely Continuous

  • A probability distribution that takes values in a continuum and admits a density function.
  • A random variable \(X\) is absolutely continuous if there exists a function \(f_X\) such that for each interval \([a,b] \subseteq \mathbb{R}\): \[ \mathrm{P}[a\le X \le b] = \int_a^b f_X(x)\,dx \]

4.2. Probability Mass Function

  • The probability mass function \(p_X(x)\) is defined as \[ p_X(x) := \mathrm{P}[X=x]. \]

4.3. Probability Density Function

4.3.1. Definition

4.3.2. Properties

  • For a real-valued random variable with an absolutely continuous univariate distribution, the density is the derivative of the cumulative distribution function:
    • \[ f_X(x) = \frac{dF_X}{dx}(x) \]

4.4. Cumulative Distribution Function

  • The cumulative distribution function of a real-valued random variable is \[ F_X(x) := \mathrm{P}[X\le x]. \]
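  • A short sketch (assuming SciPy) checking numerically, for the standard normal, that \(\mathrm{P}[a \le X \le b] = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx\):

      from scipy.integrate import quad
      from scipy.stats import norm

      a, b = -1.0, 2.0
      integral, _ = quad(norm.pdf, a, b)          # integral of the density over [a, b]
      print(integral, norm.cdf(b) - norm.cdf(a))  # both are approximately 0.8186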

4.5. Normalization and Denormalization

  • The random variable \(X\) is contravariant, and the underlying probability function is covariant:
    • \[ Z = \frac{X - \mu}{\sigma} \]
    • \[ f_Z(x) = \sigma f_X(\sigma x + \mu) \]
      • The \(\sigma\) factor is the normalization constant, to compensate for \(f_X\) being scaled down in the \(x\) direction by the factor of \(\sigma\).
  • Denormalization is the inverse of the normalization:
    • \[ X = \sigma Z + \mu \]
    • \[ f_X(x) = \frac{1}{\sigma}f_Z\left(\frac{x - \mu}{\sigma}\right) \]
  • The random variable and the probability distribution transform oppositely.
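  • A sketch (assuming SciPy) verifying the covariant transform of the density for a normal random variable, where standardizing \(X \sim \mathcal{N}(\mu, \sigma^2)\) yields the standard normal \(Z\):

      from scipy.stats import norm

      mu, sigma = 3.0, 2.0
      x = 1.5

      f_X = norm(loc=mu, scale=sigma).pdf  # density of X ~ N(mu, sigma^2)
      f_Z = norm(loc=0.0, scale=1.0).pdf   # density of Z = (X - mu) / sigma

      # f_Z(x) = sigma * f_X(sigma * x + mu): the two printed values agree.
      print(f_Z(x), sigma * f_X(sigma * x + mu))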

4.6. Combination

4.6.1. Distribution of Sum

4.6.2. Product Distribution

4.6.3. Ratio Distribution

4.7. Instances

  • The unexpected probability result confusing everyone - YouTube
    • For independent random variables \(X\) and \(Y\), uniformly distributed on \([0,1]\), the probability distribution of \(\max(X,Y)\) is equal to the probability distribution of \(\sqrt{X}\).
    • Similarly, the probability distribution of \(\max(X_1, X_2,\dots, X_n)\) is equal to the probability distribution of \(\sqrt[n]{X_1}\).
    • Shockingly, the probability distribution of \((XY)^Z\) is uniform, for independent uniformly distributed random variables \(X,Y,Z\). (See the simulation sketch below.)
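  • A Monte Carlo sketch of these claims (assuming NumPy; the seed and sample size are arbitrary):

      import numpy as np

      rng = np.random.default_rng(0)
      n = 100_000
      X, Y, Z = rng.uniform(size=(3, n))

      # max(X, Y) and sqrt(X) share the CDF t^2 on [0, 1]: compare quantiles.
      print(np.quantile(np.maximum(X, Y), [0.25, 0.5, 0.75]))  # ~ [0.50, 0.71, 0.87]
      print(np.quantile(np.sqrt(X), [0.25, 0.5, 0.75]))        # ~ [0.50, 0.71, 0.87]

      # (XY)^Z is uniform on [0, 1]: its quantiles match the identity.
      print(np.quantile((X * Y) ** Z, [0.25, 0.5, 0.75]))      # ~ [0.25, 0.50, 0.75]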

  • How to Learn Probability Distributions - YouTube

4.7.1. Discrete Distributions

4.7.1.1. Bernoulli Distribution
4.7.1.2. Binomial Distribution
4.7.1.3. Multinomial Distribution
4.7.1.4. Poisson Distribution
4.7.1.5. Geometric Distribution

4.7.2. Continuous Distributions

4.7.2.1. Normal Distribution
  • Gaussian Distribution
  • \(\mathcal{N}(\mu, \sigma^2)\)
4.7.2.1.1. Probability Density Function
  • \[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
4.7.2.1.1.1. Derivation
  • Normalizing the area of the Gaussian function: \[ e^{-x^2} \rightsquigarrow \frac{1}{\sqrt{\pi}}e^{-x^2} \]
    • This was the definition of the standard normal by Carl Friedrich Gauss.
    • It has standard deviation of \(1/\sqrt{2}\).
  • Denormalizing the probability distribution to mean \(\mu\) and standard deviation \(\sigma\): \[ \frac{1}{\sqrt{\pi}}e^{-x^2} \rightsquigarrow \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}. \]
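  • The constant \(\sqrt{\pi}\) in the first step is the Gaussian integral, evaluated by squaring and passing to polar coordinates: \[ \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)^{2} = \int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dx\,dy = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = \pi. \]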
4.7.2.2. Chi-Squared Distribution
4.7.2.2.1. Definition
  • For independent, standard normal random variables \(Z_1, \dots, Z_k\), \[ Q = \sum_{i=1}^k Z_i^2 \] is distributed according to the chi-squared distribution with \(k\) degrees of freedom: \[ Q \sim \chi^2(k). \]
  • This distribution arises in the method of least squares.
  • Chi-squared distribution - Wikipedia
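  • A quick simulation sketch (assuming NumPy and SciPy) of the definition, comparing the sum of \(k\) squared standard normals against \(\chi^2(k)\):

      import numpy as np
      from scipy.stats import chi2

      rng = np.random.default_rng(0)
      k, n = 5, 100_000
      Q = (rng.standard_normal((n, k)) ** 2).sum(axis=1)  # n draws of Q

      print(Q.mean(), chi2(k).mean())                 # both ~ k = 5
      print(np.quantile(Q, 0.95), chi2(k).ppf(0.95))  # both ~ 11.07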
4.7.2.3. F-Distribution
4.7.2.4. Student's T-Distribution
  • T-Distribution
  • The name comes from William Sealy Gosset, who published under the pseudonym "Student".
  • It has fat tails. The shape of the t-distribution approaches the standard normal distribution (4.7.2.1) as the sample size increases.
  • It is a parametric family \(t_{\rm DF}\) with respect to the degrees of freedom, which is directly related to the sample size.
4.7.2.5. Cauchy Distribution
  • Lorentz Distribution, Cauchy-Lorentz Distribution, Lorentzian Function, Breit-Wigner Distribution
  • \[ f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1+\left(\frac{x-x_0}{\gamma}\right)^2\right]}. \]
  • Its mean is undefined.
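  • A sketch (assuming NumPy) illustrating the undefined mean: the running average of Cauchy samples never settles, unlike for distributions with finite mean:

      import numpy as np

      rng = np.random.default_rng(0)
      samples = rng.standard_cauchy(1_000_000)
      running_mean = np.cumsum(samples) / np.arange(1, len(samples) + 1)

      # The running mean keeps jumping as single huge samples arrive.
      print(running_mean[[999, 9_999, 99_999, 999_999]])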
4.7.2.6. Exponential Distribution
  • Negative Exponential Distribution
  • In terms of rate \(\lambda\):
    • \[ f(x;\lambda) = \lambda e^{-\lambda x} \]
    • with \(f(x;\lambda) = 0\) if \(x<0\).
  • In terms of scale parameter \(\beta = 1/\lambda\):
    • \[ f(x;\beta) = \frac1\beta e^{-x/\beta} \]
  • It describes the distance between consecutive events in a Poisson point process.
  • It is the continuous analogue of the geometric distribution.
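  • A simulation sketch (assuming NumPy) of the Poisson-process connection: given the event count, arrival times are i.i.d. uniform on the interval, and the gaps between them average to \(1/\lambda\):

      import numpy as np

      rng = np.random.default_rng(0)
      lam, horizon = 2.0, 100_000.0

      # Rate-lambda Poisson process on [0, horizon].
      n_events = rng.poisson(lam * horizon)
      arrivals = np.sort(rng.uniform(0.0, horizon, n_events))
      gaps = np.diff(arrivals)

      print(gaps.mean(), 1 / lam)  # sample mean of the gaps ~ 0.5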
4.7.2.7. Beta Distribution
4.7.2.7.1. Definition
  • \[ f(x; \alpha, \beta) := \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\mathrm{B}(\alpha, \beta)} \]
    • where \(\mathrm{B}(\alpha,\beta)\) is the beta function.
4.7.2.7.2. Properties
  • It is the probability distribution of the estimator \(\hat{p}\) of the probability of observing a positive event after observing \(\alpha-1\) positive events and \(\beta-1\) negative events.
    • \[ \hat{p} = \frac{\alpha-1}{\alpha + \beta-2} \sim \mathcal{Be}(\alpha, \beta) \]
    • \[ \mathcal{Be}(\alpha, \beta) = \mathrm{P}[X_{\alpha+\beta-1} = \omega_+ \mid X_{\sigma(1)} = \cdots = X_{\sigma(\alpha-1)} = \omega_+, X_{\sigma(\alpha)} =\cdots = X_{\sigma(\alpha+\beta - 2)} = \omega_-] \]
    • where \(X_i\)s are independent and identically distributed (iid) random variables, with unknown probability distribution.
  • The harmonic mean is symmetric under \(\alpha \leftrightarrow \beta\) together with \(X \leftrightarrow 1-X\): \[ H_X(\alpha, \beta) = H_{1-X}(\beta, \alpha) \]
    • where \(H_X\) is defined to be \[ H_X := \frac{1}{\mathrm{E}\left[\frac{1}{X}\right]} \]
  • Concentration \(\kappa := \alpha+\beta\)
  • Mode \[ \omega = \frac{\alpha-1}{\alpha+\beta -2} \]
  • Variance \[ \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha+\beta +1)} \]
    • It is asymptotically equal to the variance of the sample mean \(\bar{x}\) of random variables distributed according to the Bernoulli distribution (4.7.1.1). \[ \frac{\hat{p}\hat{q}}{n} = \widehat{\sigma^2[\hat{p}]}, \quad\sigma^2[\bar{x}]= \sigma^2[\hat{p}] \]
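  • A sketch (assuming SciPy; \(\alpha = 5\), \(\beta = 3\) are hypothetical counts) checking the mode and variance formulas above:

      from scipy.stats import beta

      a, b = 5, 3  # after 4 positive and 2 negative observations
      dist = beta(a, b)

      mode = (a - 1) / (a + b - 2)                # = 4/6, approximately 0.667
      var = a * b / ((a + b) ** 2 * (a + b + 1))  # = 15/576, approximately 0.026

      print(mode, var, dist.var())  # var matches dist.var()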
4.7.2.7.3. Beta Prime Distribution
  • The probability distribution of the estimator of odds.

5. Parametric Family

6. Stochastic Process

6.1. Properties

  • A stochastic process \( (X_t)_{t\in \mathbb{T}} \) on the probability space \( (\Omega, \Sigma, \mathrm{P}) \) generates the natural filtration \( (\mathcal{F}_t)_{t\in \mathbb{T}} \) of \( \Sigma \): \[ \mathcal{F}_t := \sigma(X_k \mid k \le t). \]
    • The probability space is the product space of all domains of \( (X_t)_{t\in\mathbb{T}} \).

7. Filtration

7.1. Definition

  • A filtration of a structure \(S\) is a totally ordered collection of substructures:
  • An indexed family \((\mathcal{F}_i)_{i\in I}\) such that \(i\le j\implies \mathcal{F}_i \subseteq \mathcal{F}_j \subseteq S\).

8. References

Created: 2025-05-06 Tue 23:35